Statistical Optimization of Acoustic Models for Large Vocabulary Speech Recognition
Abstract
This dissertation investigates statistical optimization of acoustic models in speech recognition. Two new optimization methods are proposed, one for phonetic decision tree (PDT) search and one for hidden Markov modeling (HMM): the knowledge-based adaptive PDT algorithm and the HMM gradient boosting algorithm. Both methods are applied to improve the word error rate of a state-of-the-art speech recognition system. However, the two methods are developed in a general machine learning setting, and their applications are not limited to speech recognition. The HMM gradient boosting method is based on a function approximation scheme that optimizes in function space rather than in parameter space, exploiting the fact that the Gaussian mixture model in each HMM state is an additive model of homogeneous functions (Gaussians). It provides a new scheme that can jointly optimize model structure and parameters. Experiments are conducted on the Wall Street Journal (WSJ) task, and good improvements in word error rate are observed. The knowledge-based adaptive PDT algorithm follows a trend toward knowledge-based systems and aims to optimize the mapping from contextual phones to articulatory states by maximizing implicit usage of the phonological and phonetic information that is presumed to be contained in a large data corpus. A computationally efficient algorithm is developed to incorporate this prior knowledge into PDT construction. This algorithm is evaluated on the Telehealth conversational speech recognition task, and a significant improvement in system performance is achieved.
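The additive-model view mentioned in the abstract can be illustrated with a minimal sketch. This is not the dissertation's algorithm: it is a toy 1-D example, assuming a fixed mixing step `eps`, a fixed variance, and a hand-picked candidate list, all of which are hypothetical choices made for illustration. It shows only that a Gaussian mixture is a weighted sum of Gaussians and that a new Gaussian can be added greedily by picking the candidate that most improves the data log-likelihood.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_pdf(x, components):
    """Additive model: weighted sum of Gaussians; components = [(weight, mean, var), ...]."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)

def log_likelihood(data, components):
    """Total log-likelihood of the data under the mixture."""
    return sum(math.log(gmm_pdf(x, components)) for x in data)

def add_component(components, new_mean, new_var, eps):
    """Function-space update f_new = (1 - eps) * f + eps * g, keeping weights summing to 1."""
    scaled = [((1.0 - eps) * w, m, v) for w, m, v in components]
    return scaled + [(eps, new_mean, new_var)]

def greedy_step(data, components, candidate_means, var=1.0, eps=0.3):
    """Add the candidate Gaussian that gives the highest log-likelihood (illustrative only)."""
    best = max(
        candidate_means,
        key=lambda m: log_likelihood(data, add_component(components, m, var, eps)),
    )
    return add_component(components, best, var, eps)

# Toy data with two clusters; the initial model covers only the left one.
data = [-2.1, -1.9, -2.0, 2.0, 2.2, 1.8]
model = [(1.0, -2.0, 1.0)]
boosted = greedy_step(data, model, candidate_means=[-2.0, 0.0, 2.0])
```

In this toy setting the step changes the model structure (one more Gaussian) and its parameters (rescaled weights) at the same time, which is the general idea behind joint structure and parameter optimization; a real system would also estimate the new component's parameters and the step size from data.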
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Large Vocabulary Continuous Speech Recognition
Large-vocabulary, speaker-independent speech recognition systems that recognize continuous speech using hidden Markov models are today’s standard. This review introduces the fundamentals of speech and the underlying speech recognition problems. The three classical approaches, i.e., the acoustic-phonetic, the statistical (pattern) recognition and the artificial intelligence appro...
Continuous Speech Recognition Based on Deterministic Finite Automata Machine using Utterance and Pitch Verification
This paper introduces a set of acoustic modeling techniques for utterance verification (UV) based continuous speech recognition (CSR). Utterance verification in this work implies the ability to determine when portions of a hypothesized word string correspond to incorrectly decoded vocabulary words or out-of-vocabulary words that may appear in an utterance. This capability is implemented here as...
Generalized discriminative feature transformation for speech recognition
We propose a new algorithm called Generalized Discriminative Feature Transformation (GDFT) for acoustic models in speech recognition. GDFT is based on Lagrange relaxation on a transformed optimization problem. We show that the existing discriminative feature transformation methods like feature space MMI/MPE (fMMI/MPE), region dependent linear transformation (RDLT), and a non-discriminative feat...
Microsoft Word - Hybridmodel2.dot
Today’s state-of-the-art speech recognition systems typically use continuous density hidden Markov models with mixture of Gaussian distributions. Such speech recognition systems have problems; they require too much memory to run, and are too slow for large vocabulary applications. Two approaches are proposed for the design of compact acoustic models, namely, subspace distribution clustering hid...
Publication date: 2006